Semantic-Based K-Means Clustering for IMDB Top 100 Movies

Abstract

Textual documents are growing rapidly on the internet in today's technology era. Electronic structured databases archive offline and online documents, e-mails, webpages, blogs, and social network posts. Without appropriate ranking and demand clustering, when there is classification without any specifics, it is quite difficult to retain and access these documents. K-means is one of the methods frequently used for clustering. In terms of determining the proximity of meaning, or semantics, between data, the distance-based method still has flaws. To get around this issue, semantic similarity can be estimated by measuring the level of similarity among objects in a cluster. This research provides a clustering approach based on semantic similarity. The approach is carried out by defining document synopses from IMDB and Wikipedia using the NLTK dictionary. We provide a semantic-based K-means that assesses not only data represented as a vector space model with TF-IDF, but also semantic similarity. Precision, recall, and F-measure demonstrate how well the technique works in experimental findings on the top 100 movies dataset.
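As a rough illustration of the distance-based baseline the abstract refers to, the following sketch builds TF-IDF vectors, runs a plain K-means over them, and scores the result with precision, recall, and F-measure. It is not the paper's method: the semantic (NLTK-based) similarity step is not reproduced, the four toy "synopses" are invented for illustration, and the names `tfidf_vectors`, `kmeans`, and `pairwise_prf` are hypothetical. The pairwise definition of precision and recall used here is also an assumption, since the abstract does not say how the measures were computed.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs):
    """Return L2-normalised TF-IDF vectors for a list of tokenised documents."""
    vocab = sorted({term for doc in docs for term in doc})
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def kmeans(vectors, init_idx, iters=20):
    """Lloyd's K-means with Euclidean distance; init_idx selects starting centroids."""
    centroids = [vectors[i][:] for i in init_idx]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def pairwise_prf(pred, truth):
    """Pairwise precision, recall, and F-measure of a clustering against true labels."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(pred)), 2):
        same_pred = pred[i] == pred[j]
        same_true = truth[i] == truth[j]
        tp += same_pred and same_true
        fp += same_pred and not same_true
        fn += same_true and not same_pred
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Toy synopses (invented, not taken from the IMDB data set).
docs = [
    "mafia family crime boss".split(),
    "crime boss mafia betrayal".split(),
    "space ship alien planet".split(),
    "alien planet space war".split(),
]
labels = kmeans(tfidf_vectors(docs), init_idx=(0, 2))
scores = pairwise_prf(labels, truth=[0, 0, 1, 1])
print(labels, scores)
```

Because this baseline compares documents only by shared terms, two synopses that describe the same theme with different words would look unrelated. That is the gap a semantic similarity measure is meant to close.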

Related resources

Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm

Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group/cluster are more similar in some sense or another. It is a main task of exploratory data mining, and a common technique for statistical data analysis. This paper proposed an improved version of K-Means algorithm, namely Persistent K...

Comparing Model-based Versus K-means Clustering for the Planar Shapes

In some fields, there is an interest in distinguishing different geometrical objects from each other. A field of research that studies the objects from a statistical point of view, provided they are invariant under translation, rotation and scaling effects, is known as the statistical shape analysis. Having some objects that are registered using key points on the outline...

Graph based k-means clustering

An original approach to cluster multi-component data sets is proposed that includes an estimation of the number of clusters. Using Prim’s algorithm to construct a minimal spanning tree (MST) we show that, under the assumption that the vertices are approximately distributed according to a spatial homogeneous Poisson process, the number of clusters can be accurately estimated by thresholding the ...

Enhanced Clustering Based on K-means Clustering Algorithm and Proposed Genetic Algorithm with K-means Clustering

This paper targets a variety of techniques, approaches, and distinct areas of research that are useful and marked as the crucial field of data mining technologies. The overall purpose of the data mining process is to extract useful information from a large data set and transform it into an understandable form for further use. Clustering i...

Journal

Journal title: Journal of Applied Science and Technology Trends

Year: 2022

ISSN: 2708-0757

DOI: https://doi.org/10.38094/jastt302138